Method and Device for Denoising Videos Based on Non-Local Means

ABSTRACT

Disclosed are a method and a device for denoising a video based on non-local means, which are capable of self-adaptive adjustment in response to illumination variance in the frames of the video.

TECHNICAL FIELD

The present application relates to a method and a device for denoising videos based on non-local means.

BACKGROUND

The time-domain correlation between the frames of an image sequence provides important prior information. In digital video denoising research, in order to make rational use of this time-domain information, Buades et al. proposed a filtering method based on non-local means. According to the concept of non-local means, image blocks having similar structure may be far from each other in image space, but blocks whose structure is similar to that of a present image block can be found anywhere in the image space. A weighted average of these blocks is then taken, thereby removing noise from the pixel at the center of the present image block.

However, the inventors have found that conventional methods based on non-local means use image pixel values directly as local image structure features. As a result, they cannot adapt to changes in illumination conditions, so a large portion of matching image blocks having similar structure may be overlooked.

SUMMARY

The present disclosure intends to provide a method and a device for denoising videos based on non-local means to solve the foregoing problems.

According to an embodiment of the present application, a local image block centered on a pixel i in a present frame of the video is constructed, and a search area centered on the pixel is then delimited as a search window. For a pixel j in each of the remaining frames of the video, a respective search window is delimited in the same way as in the present frame, wherein the constructed search windows constitute a three-dimensional search space. With reference to the search window of the present frame, a histogram normalization filtering process is carried out on the search windows of the remaining frames, so as to obtain a three-dimensional search space with illumination invariance.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings described herein are used to provide a further understanding to the present application and constitute a part of this specification. Exemplary embodiments of the present application and their descriptions serve to explain the present application and do not constitute improper limitation on the present application. In the drawings:

FIG. 1 depicts a schematic constructive diagram of local image blocks, search windows and a search space according to an embodiment of the present application.

FIG. 2 depicts a schematic diagram of a histogram normalization filtering process of a three-dimensional search space according to an embodiment of the present application.

FIG. 3 depicts a flowchart of a video denoising process according to an embodiment of the present application.

FIG. 4 depicts a schematic diagram of image matching and similarity weight calculation in the filtered three-dimensional search space according to an embodiment of the present application.

FIG. 5 depicts a schematic diagram of a filtering module according to an embodiment of the present application.

DETAILED DESCRIPTION

Hereinafter, the present application will be explained in detail with reference to the accompanying drawings in connection with the embodiments.

In an embodiment of the present application, a method for denoising videos based on non-local means is provided. The method may remove effects from illumination variances in a video by means of image histogram normalization filtering processes.

In the related art, methods based on non-local means implicitly assume that lighting conditions remain consistent between frames of an image sequence having similar local structure. That is, these conventional methods ignore the influence of possible changes in lighting conditions, such as the effect of a flash lamp, during photography. Under different lighting conditions, similar local structures of the images may not match well. Therefore, because the methods of the related art use pixel values directly as local image structure features, they cannot make self-adaptive adjustment in response to illumination variances, so a large portion of matching image blocks having similar structure may be overlooked.

An embodiment of the present application is proposed for the situation where lighting conditions change within a video image sequence. Specifically, by means of image histogram normalization filtering, it makes the search for similar image blocks among frames and the computation of similarity weights more robust against changes in lighting conditions, and then denoises the video based on non-local means under the resulting consistent lighting. This embodiment addresses a gap in video denoising methods with respect to changing lighting conditions. It may better remove the noise in video sequences, in particular where lighting conditions change during shooting, for example due to a flash lamp, and thereby improve both the robustness of conventional video denoising methods and the visual quality of the denoised images.
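By way of illustration only, the following minimal Python sketch shows why pixel values used directly as structure features fail under a lighting change. The block values, the brightness offset of 50 and the constant h = 10 are hypothetical numbers chosen for the example, and the plain (unweighted) squared distance is used for brevity.

```python
import math

def block_weight(block_a, block_b, h):
    """Unnormalized non-local-means weight exp(-||a - b||^2 / h^2)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(block_a, block_b))
    return math.exp(-dist_sq / (h * h))

# Hypothetical flattened 3x3 block and the same structure lit by a flash.
block = [10, 20, 30, 20, 40, 20, 30, 20, 10]
flashed = [v + 50 for v in block]  # uniform brightness offset (assumption)

h = 10.0
w_same = block_weight(block, block, h)     # identical lighting: weight 1.0
w_flash = block_weight(block, flashed, h)  # same structure, brighter: ~0
```

Although the two blocks have identical structure, the brightness offset alone drives the weight to practically zero, so the matching block would contribute nothing to the average.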

FIG. 3 depicts a flowchart of a video denoising process according to an embodiment of the present application. Preferably, the process of removing illumination variances in a video by means of image histogram normalization filtering may comprise the following steps.

In step S10, a local image block centered on a pixel to be processed in a present frame is constructed and a search area centered on the pixel is delimited as a search window. Similarly, the corresponding search windows in corresponding areas of other frames are established within a three-dimensional search space, wherein the search windows constitute the three-dimensional search space.

In step S20, in reference to or based on the search window of the present frame, a histogram normalization filtering process is carried out on the corresponding search windows of the other frames within the three-dimensional search space to obtain a three-dimensional search space with illumination invariance.

The preferred embodiment takes into account the influence of changes in lighting conditions on the global and local brightness of the frames in a sequence. By applying image histogram normalization filtering when the three-dimensional search space is constructed, it eliminates the matching errors that such changes introduce between image blocks having very similar structure. The robustness and accuracy of similarity weight calculation and similarity matching of image blocks are thereby effectively improved, and the overall video denoising performance is further improved.

FIG. 1 depicts a schematic constructive diagram of local image blocks, search windows and a search space according to an embodiment of the present application, and FIG. 2 depicts a schematic diagram of a histogram normalization filtering process of a three-dimensional search space according to an embodiment of the present application. As shown in FIG. 2, a local image block centered on a pixel i of a frame f_(n) is constructed first. A search window w_(n) centered on the pixel i is then constructed, wherein the search window w_(n) is larger than the range of the local image block. Search windows at corresponding positions in each of the k frames before and after the frame f_(n) are also constructed, so as to obtain a set of search windows {w_(n−k), . . . , w_(n−1), w_(n), w_(n+1), . . . , w_(n+k)} which constitute a three-dimensional search space W.
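Purely as a non-limiting sketch, the construction of the search windows and of the three-dimensional search space W may be expressed in Python as follows. The function names, the representation of a frame as a list of rows, and the clipping of windows at frame borders and at the ends of the video are assumptions made for the example and are not specified in the present application.

```python
def search_window(frame, row, col, half):
    """Window of up to (2*half+1)^2 pixels centered on (row, col).

    The window is clipped at the frame borders (an assumption).
    """
    top, bottom = max(0, row - half), min(len(frame), row + half + 1)
    left, right = max(0, col - half), min(len(frame[0]), col + half + 1)
    return [r[left:right] for r in frame[top:bottom]]

def search_space(frames, n, row, col, half, k):
    """Windows w_(n-k) .. w_(n+k) at the same position, keyed by frame index.

    Frame indices outside the video are skipped (an assumption).
    """
    first, last = max(0, n - k), min(len(frames) - 1, n + k)
    return {m: search_window(frames[m], row, col, half)
            for m in range(first, last + 1)}

# Usage with five synthetic 6x6 frames; pixel value encodes (frame, row, col).
frames = [[[100 * m + 6 * r + c for c in range(6)] for r in range(6)]
          for m in range(5)]
W = search_space(frames, n=2, row=3, col=3, half=1, k=1)
```

Here W holds one 3x3 window for each of the frames 1, 2 and 3, all centered on the same position.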

For example, in step S20 the image value $\nu_{j}^{out}$ of the pixel j after filtering is calculated as follows:

$\nu_{j}^{out} = G^{-1}\left( T\left( \nu_{j}^{in} \right) \right),$

where $\nu_{j}^{in}$ are values of respective pixel j in the corresponding search windows of the other frames within the three-dimensional search space,

$\nu_{j}^{out}$ is the image value of the pixel j after filtering,

$T(\nu^{in}) = \int_{0}^{\nu^{in}} p_{in}(w)\,dw$, where $p_{in}(w)$ is a probability density of a histogram of $\nu_{j}^{in}$ distributed at the luminance level of w,

$G^{-1}$ is an inverse function of function G, wherein $G(T(\nu^{in})) = \int_{0}^{T(\nu^{in})} p_{out}(t)\,dt$, and $p_{out}(t)$ is a probability density of a histogram of $\nu_{j}^{out}$ distributed at the luminance level of t.

The action of this step substantially eliminates the influence of the illumination variances, so a three-dimensional search space with illumination invariance is obtained.
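As a hedged illustration of step S20, the following Python sketch applies a discrete form of the mapping G⁻¹(T(·)) to one search window. It assumes integer luminance levels in the range 0 to 255, takes the target histogram p_out from the search window of the present frame, and uses as the discrete inverse of G the smallest level whose cumulative value reaches T. The function names and these discretization choices are assumptions for the example only.

```python
from bisect import bisect_left

def cdf(values, levels=256):
    """Cumulative histogram: cdf[v] = fraction of pixels with value <= v.

    Values are assumed to be integers in range(levels).
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total, running, out = len(values), 0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out

def match_histogram(window, reference, levels=256):
    """Map each pixel v of `window` to G^-1(T(v)), with G built from `reference`."""
    src = [v for row in window for v in row]
    ref = [v for row in reference for v in row]
    T, G = cdf(src, levels), cdf(ref, levels)
    # Discrete inverse of G: smallest level whose reference CDF reaches T(v).
    lookup = [min(bisect_left(G, T[v]), levels - 1) for v in range(levels)]
    return [[lookup[v] for v in row] for row in window]

# Usage: a window of another frame that is uniformly brighter by 50 levels.
reference = [[10, 20], [30, 40]]   # search window of the present frame
brighter = [[60, 70], [80, 90]]    # same structure under stronger lighting
normalized = match_histogram(brighter, reference)
```

In this small case the brightened window is mapped back onto the luminance levels of the reference window, so block matching can proceed as if the lighting had not changed.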

In step S30, a similarity weight between the pixels i and j is determined by calculating structural differences between the local image block of f_(n) and all the local image blocks in W. FIG. 4 depicts a schematic diagram of image matching and similarity weight calculation in the filtered three-dimensional search space according to an embodiment of the present application.

In particular, in step S30 the similarity weight ω(i,j) is calculated as follows:

${\omega \left( {,j} \right)} = {\frac{1}{Z()}{\exp \left( {- \frac{{{{v\left( B_{i} \right)} - {v\left( B_{j} \right)}}}_{2,a}^{2}}{h^{2}}} \right)}}$

where, ω(i,j) is the similarity weight between the pixels i and j,

B_(i), B_(j) represent local image blocks centered on the pixels i and j,

ν(B_(i)) and ν(B_(j)) represent vectors constituted by the values of pixels in local image blocks,

$\left\| \cdot \right\|_{2,a}^{2}$ indicates the weighted Euclidean distance between two vectors, in which the spatial weight distribution conforms to a Gaussian distribution and the symbol a denotes the variance of that distribution,

exp represents an exponential function,

${{Z()} = {\sum\limits_{j \in W}{\exp \left( {- \frac{{{{v\left( B_{i} \right)} - {v\left( B_{j} \right)}}}_{2,a}^{2}}{h^{2}}} \right)}}},$

h is a designated constant and may be optimized according to different videos for controlling the attributes of weight calculation.
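The weight calculation of step S30 may be sketched, by way of example only, as follows. Blocks are assumed to be square and passed as flattened lists, the Gaussian spatial kernel is normalized to sum to one, and the function names are hypothetical; none of these details is prescribed by the present application.

```python
import math

def gaussian_kernel(size, a):
    """Spatial weights for a size x size block; `a` is the Gaussian variance."""
    c = size // 2
    raw = [math.exp(-((r - c) ** 2 + (q - c) ** 2) / (2.0 * a))
           for r in range(size) for q in range(size)]
    total = sum(raw)
    return [x / total for x in raw]  # normalization is an assumption

def weighted_dist_sq(block_i, block_j, kernel):
    """Gaussian-weighted squared difference of two flattened blocks."""
    return sum(g * (p - q) ** 2 for g, p, q in zip(kernel, block_i, block_j))

def similarity_weights(block_i, candidates, h, a):
    """omega(i, j) for every candidate block B_j, normalized by Z(i)."""
    size = math.isqrt(len(block_i))
    kernel = gaussian_kernel(size, a)
    raw = [math.exp(-weighted_dist_sq(block_i, b, kernel) / (h * h))
           for b in candidates]
    z = sum(raw)  # Z(i)
    return [x / z for x in raw]

# Usage: two candidates identical to B_i and one very different candidate.
block_i = [1] * 9
candidates = [[1] * 9, [1] * 9, [100] * 9]
weights = similarity_weights(block_i, candidates, h=10.0, a=1.0)
```

The two identical candidates share almost all of the weight equally, while the dissimilar candidate receives a weight close to zero. A smaller h makes this decay sharper.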

In step S40, according to the similarity weight, the pixel i is denoised based on non-local means, for example, by rule of

${{{NL}\left\lbrack {v()} \right\rbrack} = {\sum\limits_{j \in W_{f}}{{\omega \left( {,j} \right)}{v(j)}}}},$

where NL[ν(i)] is the replaced image value of pixel i, and W_(f) represents the three-dimensional search space.
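For illustration, step S40 for a single pixel i may be sketched in Python as below. The sketch assumes that the search space is given as a list of already filtered two-dimensional windows, that only positions j at which a full block fits inside a window are used, and that the plain squared Euclidean distance replaces the Gaussian-weighted distance for brevity. The function names are hypothetical.

```python
import math

def block_at(window, r, c, half):
    """Flattened (2*half+1)^2 block centered on (r, c) of a 2-D window."""
    return [window[y][x]
            for y in range(r - half, r + half + 1)
            for x in range(c - half, c + half + 1)]

def nl_means_pixel(space, block_i, half, h):
    """NL[v(i)]: weighted average of v(j) over all windows in `space`."""
    num = z = 0.0
    for window in space:
        rows, cols = len(window), len(window[0])
        for r in range(half, rows - half):
            for c in range(half, cols - half):
                block_j = block_at(window, r, c, half)
                d2 = sum((p - q) ** 2 for p, q in zip(block_i, block_j))
                w = math.exp(-d2 / (h * h))  # unnormalized omega(i, j)
                num += w * window[r][c]      # accumulates omega * v(j)
                z += w                       # accumulates Z(i)
    return num / z

# Usage: two noisy observations of a flat region and one unrelated window.
wa = [[10, 10, 10], [10, 12, 10], [10, 10, 10]]
wb = [[10, 10, 10], [10, 8, 10], [10, 10, 10]]
wc = [[200] * 3 for _ in range(3)]
denoised = nl_means_pixel([wa, wb, wc], block_i=[10] * 9, half=1, h=10.0)
```

The two similar windows receive equal weights and the unrelated window receives a weight of practically zero, so the noisy center values 12 and 8 average to 10.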

In this preferred embodiment, by denoising each pixel in the original noised image sequence in a designated order, a noise-removed image sequence can eventually be obtained and the denoising process of the whole video is completed.

In an embodiment of the present application, a device for denoising a video based on non-local means is provided. The device may include a filtering module 200 for removing illumination variances in a video by means of image histogram normalization filtering. Because this device removes illumination variances in a video by means of image histogram normalization filtering, it can make self-adaptive adjustment in response to changes in lighting conditions.

As shown in FIG. 5, the filtering module 200 may comprise a space module 201 and a histogram module 202. The space module 201 is configured to construct a local image block centered on a pixel to be processed in a present frame, to delimit a search area centered on the pixel as a search window, and to establish corresponding search windows in corresponding areas of other frames within a three-dimensional search space, wherein the search windows constitute the three-dimensional search space. The histogram module 202 is configured to carry out, based on or in reference to the search window of the present frame, a histogram normalization filtering process on the corresponding search windows of the other frames within the three-dimensional search space, so as to obtain a three-dimensional search space with illumination invariance.

Preferably, the space module 201 is used for constructing a local image block centered on a pixel i of a frame f_(n), constructing a search window w_(n) centered on the pixel i which is larger than the range of the local image block, and constructing one search window at the corresponding position in each of the k frames before and after the frame f_(n), thereby obtaining a set of search windows {w_(n−k), . . . , w_(n−1), w_(n), w_(n+1), . . . , w_(n+k)} which constitute a three-dimensional search space W.

The histogram module 202 is used to carry out the histogram normalization filtering process by rule of $\nu_{j}^{out} = G^{-1}\left( T\left( \nu_{j}^{in} \right) \right)$,

where $\nu_{j}^{in}$ are image values of respective pixel j in the corresponding search windows of the other frames within the three-dimensional search space,

$\nu_{j}^{out}$ is the image value of the pixel j after filtering,

$T(\nu^{in}) = \int_{0}^{\nu^{in}} p_{in}(w)\,dw$, where $p_{in}(w)$ is a probability density of a histogram of $\nu_{j}^{in}$ distributed at the luminance level of w,

$G^{-1}$ is an inverse function of function G, wherein $G(T(\nu^{in})) = \int_{0}^{T(\nu^{in})} p_{out}(t)\,dt$, and $p_{out}(t)$ is a probability density of a histogram of $\nu_{j}^{out}$ distributed at the luminance level of t.

By means of image histogram normalization filtering, the present application may make better use of the time-domain consistency between video image sequences and be less influenced by changes in lighting conditions, so that the recovery quality and the robustness of video denoising are improved.

It will be readily apparent to those skilled in the art that the modules or steps of the present application may be implemented with a common computing device. In addition, the modules or steps of the present application can be concentrated or run in a single computing device or distributed in a network composed of multiple computing devices. Optionally, the modules or steps may be implemented as executable program code, so that they can be stored in a storage medium and run by one or more processors in the device, or a plurality of the modules or steps can be fabricated into an individual integrated circuit module. Therefore, the present application is not limited to any particular hardware, software or combination thereof.

The foregoing is only preferred embodiments of the present application, and it is not intended to limit the present application. Moreover, it will be apparent to those skilled in the art that various modifications and variations can be made to the present application. Thus, any modifications, equivalent substitutions, improvements etc. within the spirit and principle of the present application should be included within the scope of protection of the application. 

What is claimed is:
 1. A computer-implemented method for denoising a video based on non-local means, comprising: constructing a local image block centered on a pixel i in a present frame of the video; delimiting a search area centered on the pixel as a search window; constructing a respective search window for a pixel j in each of remaining frames of the video in the same way as the present frame, wherein the constructed search windows constitute a three-dimensional search space, and carrying out, in reference to the search window of the present frame, a histogram normalization filtering process of the search windows of the remaining frames, so as to obtain a three-dimensional search space with illumination invariance.
 2. The method of claim 1, wherein the step of delimiting further comprises: constructing the search window centered on the pixel in a present frame, which is larger than a range of the local image block.
 3. The method of claim 2, wherein the step of carrying out further comprises: setting $\nu_{j}^{out} = G^{-1}\left( T\left( \nu_{j}^{in} \right) \right)$ where, $\nu_{j}^{in}$ is the image value of the pixel j before the filtering, $\nu_{j}^{out}$ is the image value of the pixel j after filtering, $T(\nu^{in}) = \int_{0}^{\nu^{in}} p_{in}(w)\,dw$, $p_{in}(w)$ is a probability density of a histogram of $\nu_{j}^{in}$ distributed at a luminance level of w, $G^{-1}$ is an inverse function of function G, wherein $G(T(\nu^{in})) = \int_{0}^{T(\nu^{in})} p_{out}(t)\,dt$, $p_{out}(t)$ is a probability density of a histogram of $\nu_{j}^{out}$ distributed at a luminance level of t.
 4. The method of claim 3, further comprising: determining a similarity weight between the pixel i and the pixel j by calculating structural differences between the local image block of the present frame and all other local image blocks in the three-dimensional search space, and denoising the pixel i based on non-local means according to the determined similarity weight.
 5. The method of claim 4, wherein the step of determining a similarity weight is carried out by rule of: $\omega(i,j) = \frac{1}{Z(i)} \exp\left( - \frac{\left\| \nu(B_{i}) - \nu(B_{j}) \right\|_{2,a}^{2}}{h^{2}} \right)$ where, ω(i,j) is the similarity weight between the pixels i and j, B_(i) and B_(j) represent local image blocks centered on the pixels i and j, respectively, ν(B_(i)) and ν(B_(j)) represent vectors constituted by values of pixels in local image blocks B_(i) and B_(j) corresponding to the pixels i and j, respectively, $\left\| \cdot \right\|_{2,a}^{2}$ indicates the weighted Euclidean Distance between two vectors ν(B_(i)) and ν(B_(j)), in which symbol a means a spatial weight distribution which conforms to a Gaussian Distribution with its variance of a, exp represents an exponential function, $Z(i) = \sum_{j \in W} \exp\left( - \frac{\left\| \nu(B_{i}) - \nu(B_{j}) \right\|_{2,a}^{2}}{h^{2}} \right),$ h is a designated constant and is optimized according to different videos for controlling attributes of weight calculation.
 6. The method of claim 5, wherein the step of denoising the pixel i based on non-local means according to the determined similarity weight is carried out by rule of: $NL\left\lbrack \nu(i) \right\rbrack = \sum_{j \in W_{f}} \omega(i,j)\,\nu(j),$ where, NL[ν(i)] is a replaced image value of the pixel i, and W_(f) represents the three-dimensional search space.
 7. A device for denoising a video based on non-local means, comprising one or more processors configured to run a space module to construct a local image block centered on a pixel i in a present frame of the video; delimit a search area centered on the pixel as a search window; construct a respective search window for a pixel j in each of remaining frames of the video in the same way as the present frame, wherein the constructed search windows constitute a three-dimensional search space, and the one or more processors are further configured to run a histogram module to carry out, in reference to the search window of the present frame, a histogram normalization filtering process of the search windows of the remaining frames, so as to obtain a three-dimensional search space with illumination invariance.
 8. The device of claim 7, wherein the space module is configured to construct the search window centered on the pixel in a present frame, which is larger than a range of the local image block.
 9. The device of claim 7, wherein the histogram module is configured to carry out the histogram normalization filtering process by rule of $\nu_{j}^{out} = G^{-1}\left( T\left( \nu_{j}^{in} \right) \right)$, where, $\nu_{j}^{in}$ is the image value of the pixel j before the filtering, $\nu_{j}^{out}$ is the image value of the pixel j after filtering, $T(\nu^{in}) = \int_{0}^{\nu^{in}} p_{in}(w)\,dw$, $p_{in}(w)$ is a probability density of a histogram of $\nu_{j}^{in}$ distributed at a luminance level of w, $G^{-1}$ is an inverse function of function G, wherein $G(T(\nu^{in})) = \int_{0}^{T(\nu^{in})} p_{out}(t)\,dt$, $p_{out}(t)$ is a probability density of a histogram of $\nu_{j}^{out}$ distributed at a luminance level of t.
 10. The device of claim 9, wherein the histogram module is configured to determine a similarity weight between the pixel i and the pixel j by calculating structural differences between the local image block of the present frame and all other local image blocks in the three-dimensional search space, and to denoise the pixel i based on non-local means according to the determined similarity weight.
 11. The device of claim 10, wherein the histogram module is configured to determine the similarity weight by rule of: $\omega(i,j) = \frac{1}{Z(i)} \exp\left( - \frac{\left\| \nu(B_{i}) - \nu(B_{j}) \right\|_{2,a}^{2}}{h^{2}} \right)$ where, ω(i,j) is the similarity weight between the pixels i and j, B_(i) and B_(j) represent local image blocks centered on the pixels i and j, respectively, ν(B_(i)) and ν(B_(j)) represent vectors constituted by values of pixels in local image blocks B_(i) and B_(j) corresponding to the pixels i and j, respectively, $\left\| \cdot \right\|_{2,a}^{2}$ indicates the weighted Euclidean Distance between two vectors ν(B_(i)) and ν(B_(j)), in which symbol a means a spatial weight distribution which conforms to a Gaussian Distribution with its variance of a, exp represents an exponential function, $Z(i) = \sum_{j \in W} \exp\left( - \frac{\left\| \nu(B_{i}) - \nu(B_{j}) \right\|_{2,a}^{2}}{h^{2}} \right),$ h is a designated constant and is optimized according to different videos for controlling attributes of weight calculation.
 12. The device of claim 10, wherein the histogram module is configured to denoise the pixel i based on non-local means by rule of: $NL\left\lbrack \nu(i) \right\rbrack = \sum_{j \in W_{f}} \omega(i,j)\,\nu(j)$ where, NL[ν(i)] is a replaced image value of the pixel i, and W_(f) represents the three-dimensional search space.